Penalty function maximization for large margin HMM training
Authors
Abstract
We perform large margin training of HMM acoustic parameters by maximizing a penalty function that combines two terms. The first is a scale factor that multiplies the Hamming distance between HMM state sequences to form a multi-label (or sequence) margin. The second arises from constraints on the training data requiring that the joint log-likelihood of the acoustics and the correct word sequence exceed the joint log-likelihood of the acoustics and each incorrect word sequence by at least the multi-label margin between the corresponding Viterbi state sequences. Using the softmax trick, we collapse these constraints into a boosted MMI-like term. The resulting objective function can be maximized efficiently with extended Baum-Welch updates. Experimental results on multiple LVCSR tasks show a good correlation between the objective function and the word error rate.
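The softmax trick described above can be illustrated with a small numerical sketch: each margin constraint asks the reference path's joint log-likelihood to beat every competitor's by at least a scaled Hamming distance, and a log-sum-exp over the boosted competitors collapses all of those constraints into one boosted MMI-like term. The function names, the `rho` boosting factor, and the toy values below are illustrative assumptions, not code from the paper:

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def boosted_mmi_objective(ref_loglik, comp_logliks, hamming_dists, rho):
    """Hypothetical per-utterance objective in the spirit of the abstract:
    the reference word sequence's joint log-likelihood minus a softmax over
    competing sequences, each boosted by rho times the Hamming distance of
    its Viterbi state sequence from the reference one. Larger rho demands a
    larger margin from competitors with more frame-level state errors."""
    boosted = [ll + rho * d for ll, d in zip(comp_logliks, hamming_dists)]
    return ref_loglik - logsumexp(boosted)

# Toy example: one competitor that matches the reference path exactly
# (Hamming distance 0) is not boosted at all.
obj = boosted_mmi_objective(0.0, [-1.0], [0.0], rho=0.5)

# Increasing rho tightens the required margin, so the objective drops.
lo = boosted_mmi_objective(0.0, [-1.0, -2.0], [2.0, 4.0], rho=0.5)
hi = boosted_mmi_objective(0.0, [-1.0, -2.0], [2.0, 4.0], rho=0.0)
```

In practice the sum over competitors runs over a lattice rather than an explicit list, which is what makes extended Baum-Welch updates applicable; this sketch only shows the shape of the objective.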
Similar resources
Hierarchical Command Recognition Based on Large Margin Hidden Markov Models
The dominant role of Hidden Markov Models (HMMs) in automatic speech recognition (ASR) is not to be denied. At first, HMMs were trained using the Maximum Likelihood (ML) approach, via the Baum-Welch or Expectation Maximization algorithms (Rabiner, 1989). Then, discriminative training methods emerged, e.g. Minimum Classification Error (Sha & Saul, 2007; Siohan et al., 1998), the Conditi...
Entropy and Margin Maximization for Structured Output Learning
We consider the problem of training discriminative structured output predictors, such as conditional random fields (CRFs) and structured support vector machines (SSVMs). A generalized loss function is introduced, which jointly maximizes the entropy and the margin of the solution. The CRF and SSVM emerge as special cases of our framework. The probabilistic interpretation of large margin methods ...
A Study of Duration High-Order Hidden Markov Models and Training Algorithms for Speech Recognition
The duration high-order hidden Markov model (DHO-HMM) can capture the dynamic evolution of a physical system more precisely than the first-order hidden Markov model (HMM). The relationship among the DHO-HMM, the high-order HMM (HO-HMM), the hidden semi-Markov model (HSMM), and the HMM is presented and discussed. We derive recursive forward and backward probability functions for the partial observation sequenc...
Towards more accurate clustering method by using dynamic time warping
An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increases. For this reason, it is important to have efficient computational methods and algorithms that can be applied on large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this c...
Publication year: 2008